PGN: Portable Game Notation Specification and Implementation Guide
Revised: 1993.12.19
Authors: Interested readers of the Internet newsgroup rec.games.chess
Coordinator: Steven J. Edwards (send comments to sje@world.std.com)
1: Introduction
PGN is "Portable Game Notation", a format designed for the representation of
chess game data using ASCII text files. PGN is structured to allow easy
reading and writing by human users and easy parsing and generation by computer
programs. The intent of the definition and propagation of PGN is to facilitate
the sharing of public domain chess game data among chessplayers (both organic
and otherwise), publishers, and computer chess researchers throughout the
world.
PGN is not intended to be a general purpose format that is suitable for every
possible use; no format could fill all conceivable requirements. Instead, PGN
is proposed as a universal portable format for data interchange. The idea is
to allow the construction of a family of chess applications that can read and
write chess game data using PGN for import and export among themselves.
2: Design philosophy
Computer usage among chessplayers has become quite common in recent years and a
number of programs, commercial and public domain, are used to access, generate,
and propagate chess game data. Some of these programs are rather impressive;
most are now well behaved in that they correctly follow the Laws of Chess and
handle users' data with reasonable care. Unfortunately, most programs have
serious problems with several aspects of the external representation of chess
game data. Sometimes these problems become more visible when a user attempts
to move significant quantities of data from one program to another; if there
has been no real effort to ensure portability of data, then the chances for a
successful transfer are small at best.
The reasons for format incompatibility are easy to understand. In fact, most
of them are correlated with the same problems that have already been seen with
commercial software offerings for other domains such as word processing,
spreadsheets, fonts, and graphics. Sometimes a manufacturer deliberately
designs a data format using encryption or some other secret, proprietary
technique to "lock in" a customer. Sometimes a designer may produce a format
that can be deciphered without too much difficulty, but at the same time
publicly discourage third party software by claiming trade secret protection.
Another software producer may develop a non-proprietary system, but it may work
well only within the scope of a single program or application because it is not
easily expandable. Finally, some other software may work very well for many
purposes, but it uses symbols and language not easily understood by people or
computers available to those outside the country of development.
Therefore, a specification for a portable game notation must observe the
lessons of history and be able to handle probable needs of the future. The
design criteria for PGN were selected to meet these needs. These criteria
include:
1) The details of the system must be publicly available and free of unnecessary
complexity. Ideally, if the documentation is not available for some reason, a
typical chess software developer or user should be able to understand the data
without the need for third party assistance.
2) The details of the system must be non-proprietary so that users and software
developers are unrestricted by concerns about infringing on intellectual
property rights. The idea is to let chess programmers compete in a free market
where customers may choose software based on their real needs and not
artificial needs created by a secret data format.
3) The system must work for a variety of programs. The format should be such
that it can be used by chess database programs, chess publishing programs,
chess server programs, and chessplaying programs without being unnecessarily
specific to any particular application class.
4) The system must be easily expandable and scalable. The expansion ability
must include handling data items that may not exist currently but could be
expected to emerge in the future. Examples: new opening classifications and
new country names. The system should be scalable in that it must not have any
arbitrary restrictions concerning the quantity of stored data. Also, planned
modes of expansion should either preserve earlier databases or at least allow
for their automatic conversion.
5) The system must be international. Chess software users are found in many
countries and the system should be free of difficulties caused by conventions
local to a given region.
6) Finally, the system should handle the same kinds and amounts of data that
are already handled by existing chess software and by print media.
3: Formats: Import and export
There are two formats in the PGN specification. These are the "import" format
and the "export" format. These are two different ways of formatting the same
PGN data according to its source. The details of the two formats are described
throughout the following sections of this document.
3.1: Import format allows for manually prepared data
The import format is rather flexible and is used to describe data that may have
been prepared by hand, much like a source file for a high level programming
language. A program that can read PGN data should be able to handle the
somewhat lax import format.
3.2: Export format used for program generated output
The export format is rather strict and is used to describe data that is usually
prepared under program control, something like a pretty printed source program
reformatted by a compiler.
For a given PGN data file, export format representations generated by different
PGN programs on the same computing system should be exactly equivalent, byte
for byte.
Export format should also be used for archival storage. Here, "archival"
storage is defined as storage that may be accessed by a variety of computing
systems. The only extra requirement for archival storage is that the newline
character have a specific representation that is independent of its value for a
particular computing system's text file usage. The archival representation of
a newline is the ASCII control character LF (line feed, decimal value 10).
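As an illustration of the archival newline rule, the following Python sketch
rewrites platform line endings as a single LF. The function name
to_archival_newlines is hypothetical and not part of PGN; the sketch assumes
that the only foreign conventions encountered are CR LF and a lone CR.

```python
def to_archival_newlines(data: bytes) -> bytes:
    """Rewrite CR LF and lone CR line endings as LF (decimal value 10)."""
    # Replace the two-byte sequence first so that it does not become two LFs.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```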
Several parts of the export format deal with exact descriptions of line and
field justification that are absent from the import format details. The main
reason for these restrictions on the export format is to allow the
construction of simple data translation programs that can easily scan PGN data
without having to have a full chess engine or other complex parsing routines.
The idea is to encourage chess software authors to always allow for at least a
limited PGN reading capability. Even when a full chess engine parsing
capability is available, it is likely to be at least two orders of magnitude
slower than a simple text scanner.
A PGN game represented using export format is said to be in reduced export
format if all of the following hold: 1) it has no commentary, 2) it has only
the standard seven tag roster identification information (see below), 3) it has
no recursive annotation variations (see below), and 4) it has no numeric
annotation glyphs (see below). Reduced export format is used for bulk storage
of unannotated games. It represents a minimum level of standard conformance
for a PGN exporting application.
4: Lexicographical issues
PGN data is composed of characters; these in turn form lexical tokens.
4.1: Character codes
PGN data is represented using only the ASCII character set with character codes
restricted to those with decimal numeric values of less than 128. Furthermore,
only printable characters with codes from 32 to 126 are used along with the
newline character and the horizontal and vertical tab characters. The external
representation of the newline character may differ among platforms; this is an
acceptable variation as long as the details of the implementation are hidden
from software implementors and users.
4.2: Tab characters
Tab characters, both horizontal and vertical, are not permitted in the export
format. This is because the treatment of tab characters is highly dependent
upon the particular software in use on the host computing system.
4.3: Line lengths
PGN data are organized as simple text lines without any special bytes or
controls for secondary record structure imposed by specific operating systems.
Import format PGN text lines are limited to having a maximum of 255 characters
per line including the newline character. Lines with 80 or more printing
characters are strongly discouraged because of the difficulties experienced by
common text editors with long lines. Export format text lines are limited to
having fewer than 80 characters per line. These limits are chosen to
facilitate ease of implementation and ease of viewing. Also, some systems
require explicit text file line record length limits. Sad, but true.
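The limits above can be checked mechanically. The Python sketch below is one
possible reading and not a normative test: the name check_line_lengths is
hypothetical, the newline is counted as one character for import format, and
it is assumed that the export limit of fewer than 80 characters does not count
the newline.

```python
def check_line_lengths(text: str, export: bool) -> list[int]:
    """Return the 1-based numbers of lines that exceed the length limits."""
    bad = []
    for number, line in enumerate(text.split("\n"), start=1):
        if export:
            ok = len(line) < 80          # fewer than 80 characters
        else:
            ok = len(line) + 1 <= 255    # 255 maximum, newline included
        if not ok:
            bad.append(number)
    return bad
```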
5: Commentary
Comment text may appear in PGN data. There are two types of comments. The
first type is the "rest of line" comment; this comment type starts with a
semicolon character and continues to the end of the line. The second type
starts with a left brace character and continues to the next right brace
character. Brace comments do not nest. A semicolon appearing inside of a
brace comment loses its special meaning and is ignored. Braces appearing
inside of a semicolon comment lose their special meaning and are ignored.
*** Export format representation of comments needs definition work.
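A scanner that discards both comment types might look like the Python sketch
below. It is only a sketch: the name strip_comments is hypothetical, each brace
comment is replaced by a single space so that neighboring tokens stay
separated, and it is assumed (the text above does not say) that semicolons and
braces inside a quoted string do not start a comment.

```python
def strip_comments(text: str) -> str:
    """Remove brace comments and rest-of-line comments from PGN text."""
    out = []
    state = "text"  # one of "text", "string", "brace", "line"
    i = 0
    while i < len(text):
        ch = text[i]
        if state == "text":
            if ch == ";":
                state = "line"
            elif ch == "{":
                state = "brace"
            else:
                if ch == '"':
                    state = "string"
                out.append(ch)
        elif state == "string":
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):
                i += 1                  # keep the escaped character as is
                out.append(text[i])
            elif ch == '"':
                state = "text"
        elif state == "brace":
            if ch == "}":               # brace comments do not nest
                out.append(" ")
                state = "text"
        else:                           # rest-of-line comment
            if ch == "\n":
                out.append(ch)
                state = "text"
        i += 1
    return "".join(out)
```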
6: Escape mechanism
There is a special escape mechanism for PGN data. This mechanism is triggered
by a percent sign character appearing in the first column of a line; the data
on the rest of the line is ignored by publicly available PGN scanning software.
This escape convention is intended for the private use of software developers
and researchers to embed commands and data in PGN streams.
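For a public scanner, honoring the escape mechanism amounts to dropping the
affected lines before tokenizing. A minimal Python sketch follows; the name
drop_escape_lines is hypothetical.

```python
def drop_escape_lines(text: str) -> str:
    """Discard every line whose first column holds a percent sign."""
    kept = [line for line in text.split("\n") if not line.startswith("%")]
    return "\n".join(kept)
```

A percent sign that is not in the first column is left untouched.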
7: Tokens
PGN character data is organized as tokens. A token is a contiguous sequence of
characters that represents a basic semantic unit. Tokens may be separated from
adjacent tokens by whitespace characters. Some tokens are self delimiting and
do not require whitespace characters.
A string token is a sequence of zero or more characters that is delimited by a
pair of quote characters (ASCII decimal value 34). An empty string is represented by
two adjacent quotes. (Note: an apostrophe is not a quote.) A quote inside a
string is represented by the backslash immediately followed by a quote. A
backslash inside a string is represented by two adjacent backslashes. Strings
are commonly used as tag pair values (see below).
An integer token is a sequence of one or more decimal digit characters. It is
a special case of the more general "symbol" token class described below.
Integer tokens are used to help represent move number indications (see below).
A period character is a token by itself. It is used for move number
indications (see below).
An asterisk character is a token by itself. It is used as one of the possible
game termination markers (see below); it indicates an incomplete game or a game
with an unknown or otherwise unavailable result.
The left and right bracket characters are tokens. They are used to delimit tag
pairs (see below).
The left and right parenthesis characters are tokens. They are used to delimit
Recursive Annotation Variations (see below).
A Numeric Annotation Glyph ("NAG", see below) is a token; it is composed of a
dollar sign character immediately followed by one or more digit characters.
A symbol token starts with a letter or digit character and is immediately
followed by a sequence of zero or more symbol continuation characters. These
continuation characters are letter characters, digit characters, the
underscore, the plus sign, the octothorpe sign (i.e., pound sign; also known as
the tic-tac-toe sign or the number sign), the equal sign, and the hyphen.
Symbols are used for a variety of purposes. All characters in a symbol are
significant.
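The token classes above map naturally onto a small regular expression scanner.
The Python sketch below is illustrative only. The names TOKEN_RE and tokenize
and the token kind labels are hypothetical. The "draw" alternative is an
assumption added so that the termination marker "1/2-1/2" scans as one token,
since the slash is not listed among the symbol continuation characters.
Comments and escape lines are assumed to have been removed beforehand.

```python
import re

TOKEN_RE = re.compile(
    r'(?P<string>"(?:\\.|[^"\\])*")'                # quoted string with escapes
    r"|(?P<nag>\$[0-9]+)"                           # Numeric Annotation Glyph
    r"|(?P<draw>1/2-1/2)"                           # assumption, see lead-in
    r"|(?P<symbol>[A-Za-z0-9][A-Za-z0-9_+#=-]*)"    # symbol (or integer)
    r"|(?P<punct>[.*\[\]()])"                       # self delimiting tokens
    r"|(?P<space>\s+)"
)

def tokenize(text: str) -> list[tuple[str, str]]:
    """Split PGN text into (kind, value) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = match.end()
        kind, value = match.lastgroup, match.group()
        if kind == "space":
            continue
        if kind == "string":
            # Drop the delimiters and undo the two backslash escapes.
            value = re.sub(r"\\(.)", r"\1", value[1:-1])
        elif kind == "symbol" and value.isdigit():
            kind = "integer"    # integers are a special case of symbols
        tokens.append((kind, value))
    return tokens
```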
8: Parsing games
The basic element of a database composed of games data using the PGN format is
the single PGN chess game. A PGN database file is a sequential collection of
zero or more PGN games. An empty file is a valid, although somewhat
uninformative, PGN database.
A PGN game is composed of two major sections. The first is the tag pair
section and the second is the movetext section. The tag pair section provides
information that identifies the game by defining the values associated with a
set of standard parameters. The movetext section gives the enumerated and
possibly annotated moves of the game along with a concluding game termination
marker. The chess moves themselves are represented using SAN (Standard
Algebraic Notation), also described later in this document.
8.1: Tag pair section
The tag pair section is composed of a series of zero or more tag pairs.
A tag pair is composed of four consecutive tokens: a left bracket token, a
symbol token, a string token, and a right bracket token. The symbol token is
the tag name and the string token is the tag value associated with the tag
name. (There is a standard set of tag names and semantics described below.)
The same tag name should not appear more than once in a tag pair section.
A further restriction on tag names is that they are composed exclusively of
letters, digits, and the underscore character. This is done to facilitate
mapping of tag names into third party database programs.
For PGN import format, there may be zero or more whitespace characters between
any adjacent pair of tokens in a tag pair.
For PGN export format, there are no whitespace characters between the left
bracket and the tag name, there are no whitespace characters between the tag
value and the right bracket, and there is a single space character between the
tag name and the tag value.
Tag names, like all symbols, are case sensitive. All tag names used for
archival storage begin with an upper case letter.
PGN import format may have multiple tag pairs on the same line and may even
have a tag pair spanning more than a single line. Export format requires each
tag pair to appear left justified on a line by itself; a single empty line
follows the last tag pair. Note that this requirement places a length limit
for the entire tag pair because of the restriction of fewer than 80 characters
per line. Specifically, the sum of the character length of the tag name and
the tag value should be less than 75.
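The export rules for a single tag pair can be sketched in Python as follows.
The name export_tag_pair is hypothetical. The sketch applies the string
escaping rules from the Tokens section and measures the finished line, so an
escaped quote or backslash counts as two characters toward the limit.

```python
import re

def export_tag_pair(name: str, value: str) -> str:
    """Format one tag pair as an export format line (without the newline)."""
    # Tag names are symbols restricted to letters, digits, and underscore.
    if re.fullmatch(r"[A-Za-z0-9][A-Za-z0-9_]*", name) is None:
        raise ValueError(f"invalid tag name: {name!r}")
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    line = f'[{name} "{escaped}"]'      # single space between name and value
    if len(line) >= 80:
        raise ValueError("tag pair does not fit on an export format line")
    return line
```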
Some tag values may be composed of a sequence of items. For example, a
consultation game may have more than one player for a given side. When this
occurs, the single character ":" (colon) appears between adjacent items.
The tag pair format is designed for expansion; initially only strings are
allowed as tag pair values. In future revisions, this will be expanded to a
general list structure as needed. This will also allow multi-line tag values
at the same time.
8.1.1: Seven Tag Roster
There is a set of tags defined for mandatory use for archival storage of PGN
data. This is the STR (Seven Tag Roster). The interpretation of these tags is
fixed as is the order in which they appear. Although other tag names and
semantics are permitted and encouraged, the STR is the common ground that all
programs should follow for public data interchange.
For import format, the order of tag pairs is not important. For export format,
the STR tags appear before any other tag pairs. (The STR tag pairs must also
appear in order; this order is described below). Also for export format, the
additional tag pairs appear in ASCII order by tag name.
The seven tag names of the STR are (in order):
1) Event (the name of the tournament or match event)
2) Site (the location of the event)
3) Date (the starting date of the game)
4) Round (the playing round ordinal of the game)
5) White (the player of the white pieces)
6) Black (the player of the black pieces)
7) Result (the result of the game)
A set of supplemental tag names is given in Appendix A of this document.
For PGN export format, a single blank line appears after the last of the tag
pairs to conclude the tag pair section. This helps simple scanning programs to
quickly determine the end of the tag pair section and the beginning of the
movetext section.
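The ordering rules for an export format tag pair section can be sketched in
Python as shown below. The names STR_DEFAULTS and export_tag_section are
hypothetical. Filling a missing STR tag with its "unknown" value is an
assumption of this sketch; a stricter program might reject the game instead.

```python
# Seven Tag Roster names in required order, with their "unknown" values.
STR_DEFAULTS = {
    "Event": "?", "Site": "?", "Date": "????.??.??", "Round": "?",
    "White": "?", "Black": "?", "Result": "*",
}

def export_tag_section(tags: dict[str, str]) -> str:
    """Return the tag pair section in export order, ending with a blank line."""
    def pair(name: str, value: str) -> str:
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'[{name} "{escaped}"]'

    # The STR comes first, in its fixed order (dicts keep insertion order).
    lines = [pair(name, tags.get(name, default))
             for name, default in STR_DEFAULTS.items()]
    # Additional tag pairs follow in ASCII order by tag name.
    lines += [pair(name, tags[name])
              for name in sorted(tags) if name not in STR_DEFAULTS]
    return "\n".join(lines) + "\n\n"
```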
8.1.1.1: The Event tag
The Event tag value should be reasonably descriptive. Abbreviations are to be
avoided unless absolutely necessary to save space. A consistent event naming
should be used to help facilitate database scanning. If the name of the event
is unknown, a single question mark should appear as the tag value.
Examples:
[Event "FIDE World Championship"]
[Event "Moscow City Championship"]
[Event "ACM North American Computer Championship"]
8.1.1.2: The Site tag
The Site tag value should include city and region names along with a standard
name for the country. The use of the International Olympic Committee three
letter names is suggested for those countries where such codes are available.
If the site of the event is unknown, a single question mark should appear as
the tag value.
Examples:
[Site "New York City, NY USA"]
[Site "St. Petersburg RUS"]
[Site "Riga LAT"]
8.1.1.3: The Date tag
The Date tag value gives the starting date for the game. (Note: this is not
necessarily the same as the starting date for the event.) The Date tag value
field always uses a standard ten character format: "YYYY.MM.DD". The first
four characters are digits that give the year, the next character is a period,
the next two characters are digits that give the month, the next character is a
period, and the final two characters are digits that give the day of the month.
If any of the digit fields are not known, then question marks are used in
place of the digits.
Examples:
[Date "1992.08.31"]
[Date "1993.??.??"]
[Date "2001.01.01"]
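A reader could validate the ten character Date format with a sketch like the
following Python code. The names DATE_RE and parse_date_tag are hypothetical.
The sketch assumes that each field is either all digits or all question marks,
and the range checks on month and day are an addition not stated above.

```python
import re

DATE_RE = re.compile(r"([0-9]{4}|\?{4})\.([0-9]{2}|\?{2})\.([0-9]{2}|\?{2})")

def parse_date_tag(value: str):
    """Return (year, month, day); an unknown field is returned as None."""
    match = DATE_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not in YYYY.MM.DD format: {value!r}")
    year, month, day = (
        None if part.startswith("?") else int(part) for part in match.groups()
    )
    if month is not None and not 1 <= month <= 12:
        raise ValueError(f"month out of range: {value!r}")
    if day is not None and not 1 <= day <= 31:
        raise ValueError(f"day out of range: {value!r}")
    return year, month, day
```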
8.1.1.4: The Round tag
The Round tag value gives the playing round for the game. In a match
competition, this value is the number of the game played. In a simultaneous
exhibition, this is the board number. If the use of a round number is
inappropriate, then the field should be a single hyphen character. If the
round is unknown, a single question mark should appear as the tag value.
Some organizers employ unusual round designations and have multipart playing
rounds and sometimes even have conditional rounds. In these cases, a multipart
round identifier can be made from a sequence of integer round numbers separated
by periods. The leftmost integer represents the most significant round and
succeeding integers represent round numbers in descending hierarchical order.
Examples:
[Round "1"]
[Round "3.1"]
[Round "4.1.2"]
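Because the leftmost integer is the most significant, a multipart round
identifier compares correctly once it is converted to a tuple of integers. The
Python sketch below shows one way to do this. The names ROUND_RE and
parse_round_tag are hypothetical, and the choice to return None for "?" and an
empty tuple for "-" is an assumption of the sketch.

```python
import re

ROUND_RE = re.compile(r"[0-9]+(?:\.[0-9]+)*")

def parse_round_tag(value: str):
    """Convert a Round tag value to a tuple of integers."""
    if value == "?":
        return None         # round unknown
    if value == "-":
        return ()           # round number inappropriate
    if ROUND_RE.fullmatch(value) is None:
        raise ValueError(f"invalid Round value: {value!r}")
    return tuple(int(part) for part in value.split("."))
```

Sorting by the parsed value places "3.1" before "4.1.2" and both before "10",
which a plain string comparison would not do.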
8.1.1.5: The White tag
The White tag value is the name of the player or players of the white pieces.
The names are given as they would appear in a telephone directory. The family
or last name appears first. If a first name or first initial is available, it
is separated from the family name by a comma and a space. Finally, one or more
middle initials may appear. If the name is unknown, a single question mark
should appear as the tag value.
The intent is to allow meaningful ASCII sorting of the tag value that is
independent of regional name formation customs. If more than one person is
playing the white pieces, the names are listed in alphabetical order and are
separated by the colon character between adjacent entries. A player who is
also a computer program should have appropriate version information listed
after the name of the program.
The format used in the FIDE Rating Lists is appropriate for use for player name
tags.
Examples:
[White "Tal, Mikhail N."]
[White "van der Wiel, Johan"]
[White "Acme Pawngrabber v.3.2"]
8.1.1.6: The Black tag
The Black tag value is the name of the player or players of the black pieces.
The names are given here as they are for the White tag value.
Examples:
[Black "Lasker, Emmanuel"]
[Black "Smyslov, Vasily V."]
[Black "KingHunter IV:Smith, John Q.:Woodpusher 2000"]
8.1.1.7: The Result tag
The Result field value is the result of the game. It is always exactly the
same as the game termination marker that concludes the associated movetext. It
is always one of four possible values: "1-0" (White wins), "0-1" (Black wins),
"1/2-1/2" (drawn game), and "*" (game still in progress, game abandoned, or
result otherwise unknown). Note that the digit zero is used in both of the
first two cases; not the letter "O".
All possible examples:
[Result "0-1"]
[Result "1-0"]
[Result "1/2-1/2"]
[Result "*"]
8.2: Movetext section
The movetext section is composed of movetext elements. These elements are:
chess moves, move number indications, optional annotations, and a single
concluding game termination marker.
Because illegal moves are not real chess moves, they are not permitted in PGN
movetext. They may appear in commentary, however. One would hope that illegal
moves are relatively rare in games worthy of recording.
8.2.1: Movetext line justification
In PGN import format, elements in the movetext do not require any specific line
justification.
In PGN export format, elements in the movetext are placed left justified on
successive text lines each of which has fewer than 80 printing characters. As
many elements as possible are placed on a line with the remainder appearing on
successive lines. A single space character appears between any two adjacent
elements on the same line in the movetext. As with the tag pair section, a
single empty line follows the last line of data to conclude the movetext
section.
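The export line filling rule is a greedy wrap. A minimal Python sketch
follows; the name wrap_movetext and its limit parameter are hypothetical, and
each element is assumed to be shorter than the limit.

```python
def wrap_movetext(elements: list[str], limit: int = 80) -> str:
    """Join movetext elements into lines shorter than `limit` characters."""
    lines = []
    current = ""
    for element in elements:
        candidate = element if not current else current + " " + element
        if len(candidate) < limit:
            current = candidate         # the element still fits on this line
        else:
            if current:
                lines.append(current)
            current = element           # start the next line
    if current:
        lines.append(current)
    # A single empty line concludes the movetext section.
    return "\n".join(lines) + "\n\n"
```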
8.2.2: Movetext move number indications
A move number indication is composed of one or more adjacent digits (an integer
token) followed by zero or more periods. The integer portion of the indication
gives the move number of the immediately following white move (if present) and
also the immediately following black move (if present).
8.2.2.1: Import format move number indications
PGN import format does not require move number indications. It does not
prohibit superfluous move number indications anywhere in the movetext as long
as the move numbers are correct.
PGN import format move number indications may have zero or more period
characters following the digit sequence that gives the move number; one or more
whitespace characters may appear between the digit sequence and the period(s).
8.2.2.2: Export format move number indications
Export format requires a move number indication immediately prior to each white
move and nowhere else. Specifically, a move number indication does not appear
immediately prior to a game termination marker.
Export format has exactly one period character immediately following the digit
sequence; this forms a single movetext element.
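Export format move numbering can be generated as in the Python sketch below.
The name export_movetext_elements is hypothetical, and the sketch assumes a
game that starts at move one with White to move and has no annotations.

```python
def export_movetext_elements(san_moves: list[str], result: str) -> list[str]:
    """Interleave move number indications with SAN moves for export format."""
    elements = []
    for index, san in enumerate(san_moves):
        if index % 2 == 0:
            # A number and exactly one period precede each white move only.
            elements.append(f"{index // 2 + 1}.")
        elements.append(san)
    elements.append(result)     # no move number before the termination marker
    return elements
```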
8.2.3: Movetext SAN (Standard Algebraic Notation)
SAN (Standard Algebraic Notation) is a representation standard for chess moves
using the ASCII Latin alphabet.
Examples of SAN recorded games are found throughout most modern chess
publications. SAN as presented in this document uses English language single
character abbreviations for chess pieces, although this is easily changed in
the source. English is chosen over other languages because it appears to be
the most widely recognized.
An alternative to SAN is FAN (Figurine Algebraic Notation). FAN uses miniature
piece icons instead of single letter piece abbreviations. The two notations
are otherwise identical.
Details about SAN construction are given in the FIDE Laws of Chess and are also
described in the following sections of this document.
8.2.3.1: Square identification
SAN identifies each of the sixty four squares on the chessboard with a unique
two character name. The first character of a square identifier is the file of
the square; a file is a column of eight squares designated by a single lower
case letter from "a" (left most or queenside) up to and including "h" (right
most or kingside). The second character of a square identifier is the rank of
the square; a rank is a row of eight squares designated by a single digit from
"1" (bottom most [White's first rank]) up to and including "8" (top most
[Black's first rank]). The initial squares of some pieces are: white queen
rook at a1, white king at e1, black queen knight pawn at b7, and black king
rook at h8.
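Programs usually convert square names to numeric coordinates. The Python
sketch below shows one possible mapping; the function names and the zero-based
(file, rank) convention are assumptions of the sketch and not part of SAN.

```python
FILES = "abcdefgh"
RANKS = "12345678"

def square_to_indices(name: str) -> tuple[int, int]:
    """Map a square name such as "e1" to zero-based (file, rank) indices."""
    if len(name) != 2 or name[0] not in FILES or name[1] not in RANKS:
        raise ValueError(f"not a square name: {name!r}")
    return FILES.index(name[0]), RANKS.index(name[1])

def indices_to_square(file_index: int, rank_index: int) -> str:
    """Map zero-based (file, rank) indices back to a square name."""
    return FILES[file_index] + RANKS[rank_index]
```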
8.2.3.2: Piece identification
SAN identifies each piece by a single upper case letter. The standard English
values: pawn = "P", knight = "N", bishop = "B", rook = "R", queen = "Q", and
king = "K".
The letter code for a pawn is not used for SAN moves in PGN output movetext.
However, some PGN import software disambiguation code may allow for the
appearance of pawn letter codes. Also, there is the possibility of using pawn
and other piece letter codes in tag pair and annotation constructs to be
defined in the future.
It is admittedly a bit chauvinistic to select English piece letters over those
from other languages. There is a slight justification in that English is a de
facto universal second language among most chessplayers and software users and
authors. It is probably the best that can be done for now. Appendix I of this
document gives alternative piece letters, but these should be used only for
local presentation software and not for archival storage or for dynamic
interchange among programs.
8.2.3.3: Basic SAN move construction
A basic SAN move is given by listing the moving piece letter (omitted for
pawns) followed by the destination square. Capture moves are denoted by the
lower case letter "x" immediately prior to the destination square; pawn
captures include the file letter of the originating square of the capturing
pawn immediately prior to the "x" character.
SAN kingside castling is indicated by the sequence "O-O"; queenside castling is
indicated by the sequence "O-O-O". Note that the upper case letter "O" is
used, not the digit zero. The use of a zero character is not only incompatible
with traditional text practices, but it can also confuse parsing software which
also has to understand about move numbers and game termination markers.
En passant captures do not have any special notation; they are formed as if the
captured pawn were on the capturing pawn's destination square. Pawn promotions
are denoted by the equal sign "=" immediately following the destination square
with a promoted piece letter (indicating one of knight, bishop, rook, or queen)
immediately following the equal sign. As above, the piece letter is in upper
case.
In the case of ambiguities (multiple pieces of the same type moving to the same
square), the first appropriate disambiguating step of the three following steps
is taken: First, if the moving pieces can be distinguished by their
originating files, the originating file letter of the moving piece is inserted
immediately after the moving piece letter. Second (when the first step fails),
if the moving pieces can be distinguished by their originating ranks, the
originating rank digit of the moving piece is inserted immediately after the
moving piece letter. Third (when both the first and the second steps fail),
the two character square coordinate of the originating square of the moving
piece is inserted immediately after the moving piece letter. The result of the
SAN actions described so far is called "the basic SAN move notation".
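The three disambiguation steps can be sketched in Python as shown below. The
names disambiguation and basic_san are hypothetical. The caller is assumed to
supply the origin squares of the other pieces of the same type that can legally
move to the same destination (here called rivals), which requires move
generation not shown. Castling and promotion are left out of the sketch.

```python
def disambiguation(origin: str, rivals: list[str]) -> str:
    """Return the text inserted after the piece letter to resolve ambiguity."""
    if not rivals:
        return ""                   # no ambiguity at all
    if all(rival[0] != origin[0] for rival in rivals):
        return origin[0]            # first step: the file letter suffices
    if all(rival[1] != origin[1] for rival in rivals):
        return origin[1]            # second step: the rank digit suffices
    return origin                   # third step: full square coordinate

def basic_san(piece: str, origin: str, dest: str,
              capture: bool = False, rivals: tuple = ()) -> str:
    """Build the basic SAN move for a non-castling, non-promoting move."""
    if piece == "P":
        # Pawn letter omitted; captures start with the originating file.
        return (origin[0] + "x" if capture else "") + dest
    return piece + disambiguation(origin, list(rivals)) + ("x" if capture else "") + dest
```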
8.2.3.4: Check and checkmate indication characters
If the move is a checking move, the plus sign "+" is appended as a suffix to
the basic SAN notation; if the move is a checkmating move, the octothorpe sign
"#" is appended instead. Neither the appearance nor the absence of either a
check or checkmating indicator is used for disambiguation purposes.
There are no special markings used for double checks or discovered checks.
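The shape of a canonical SAN move, including the check and checkmate
suffixes, can be tested with a regular expression. The Python sketch below
checks syntax only and says nothing about legality; the names SAN_RE and
looks_like_san are hypothetical.

```python
import re

SAN_RE = re.compile(
    r"(?:O-O(?:-O)?"                        # castling, upper case letter O
    r"|[KQRBN][a-h]?[1-8]?x?[a-h][1-8]"     # piece move, optional disambiguation
    r"|(?:[a-h]x)?[a-h][1-8](?:=[QRBN])?"   # pawn move, capture, promotion
    r")[+#]?"                               # optional check or checkmate suffix
)

def looks_like_san(move: str) -> bool:
    """Return True if `move` has the form of a canonical SAN move."""
    return SAN_RE.fullmatch(move) is not None
```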
8.2.3.5: SAN move length
SAN moves can be as short as two characters (e.g., "d4"), or as long as seven
characters (e.g., "Qa6xb7#"). The average SAN move length seen in realistic
games is probably just fractionally longer than three characters. If the SAN
rules seem complicated, be assured that the earlier notation systems of LEN
(Long English Notation) and EDN (English Descriptive Notation) are much more
complex, and that LAN (Long Algebraic Notation, the predecessor of SAN) is
unnecessarily bulky.
8.2.3.6: Import and export SAN
PGN export format always uses the above canonical SAN to represent moves in the
movetext section of a PGN game. Import format is somewhat more relaxed and it
makes allowances for moves that do not conform exactly to the canonical format.
However, the allowances may differ among different PGN reader software. Only
data appearing in export format is in all cases guaranteed to be importable
into all PGN readers.
There are a number of suggested guidelines for use with implementing PGN reader
software for permitting non-canonical SAN move representation. The idea is to
have a PGN reader apply various transformations to attempt to discover the move
that is represented by non-canonical input. Some suggested transformations
include: letter case remapping, capture indicator insertion, check indicator
insertion, and checkmate indicator insertion.
8.2.4: Movetext NAG (Numeric Annotation Glyph)
An NAG (Numeric Annotation Glyph) is a movetext element that is used to
indicate a simple annotation in a language independent manner. An NAG always
annotates the immediately preceding move.
*** The NAG "$0" is defined to be the null annotation. Additional NAGs are to
be defined later. Also, it may be useful to extend NAG usage to include
operands other than or in addition to the immediately preceding move.
8.2.5: Movetext RAV (Recursive Annotation Variation)
An RAV (Recursive Annotation Variation) is a sequence of movetext containing
zero or more moves enclosed in parentheses. An RAV is used to represent an
alternative variation. The alternate move sequence given by an RAV is one that
may be legally played by first unplaying the move that appears immediately
prior to the RAV. Because the RAV is a recursive construct, it may be nested.
*** The specification for import/export representation of RAV elements needs
further development.
Appendix A: Supplemental tag names
The following tag names and their associated semantics are recommended for use
for information not contained in the Seven Tag Roster.
A.1: Player related information
WhiteTitle, BlackTitle: String values such as "FM", "IM", and "GM"; these tags
are used only for the standard abbreviations for FIDE titles.
WhiteElo, BlackElo: Integer values; these are used for FIDE Elo ratings.
WhiteUSCF, BlackUSCF: Integer values; these are used for USCF (United States
Chess Federation) ratings. Similar tag names can be constructed for other
rating agencies.
A.2: Event related information
EventDate: A date value, similar to the Date tag field, that gives the starting
date of the Event.
EventSponsor: A string value giving the name of the sponsor of the event.
Section: A string; this is used for the playing section of a tournament (e.g.,
"Open" or "Reserve").
Stage: A string; this is used for the stage of a multistage event (e.g.,
"Preliminary" or "Semifinal").
Board: An integer; this identifies the board number in a team event.
A.3: Opening information
Opening: A string; this is used for the traditional opening name. This will
vary by locale.
Variation: A string; this is used to further refine the Opening tag. This will
vary by locale.
SubVariation: A string; this is used to further refine the Variation tag. This
will vary by locale.
ECO: String of the form "XDD/DD" where the "X" is a letter from "A" to "E" and
the "D" positions are digits; this is used for an opening designation from the
five volume _Encyclopedia of Chess Openings_.
NIC: A string; this is used for an opening designation from the _New in Chess_
database.
A.4: Miscellaneous
Annotator: A name or names in the format of the player name tags; this
identifies the annotator of the game.
Time: A time-of-day value in the form "HH:MM:SS"; similar to the Date tag
except that it denotes the local clock time (hours, minutes, and seconds) of
the start of the game. Note that colons, not periods, are used for internal
separators for the Time value.
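The ECO and Time value forms described above lend themselves to simple mechanical checks. The following is a minimal sketch in Python; the function names are hypothetical, the ECO pattern follows only the "XDD/DD" form given in A.3, and the range limits on hours, minutes, and seconds are an assumption not stated in this document.

```python
import re

# Assumed patterns built only from the forms "XDD/DD" and "HH:MM:SS".
ECO_PATTERN = re.compile(r"[A-E][0-9]{2}/[0-9]{2}")
TIME_PATTERN = re.compile(r"([0-9]{2}):([0-9]{2}):([0-9]{2})")


def is_valid_eco(value):
    """Return True if value has the form XDD/DD with X in A..E."""
    return ECO_PATTERN.fullmatch(value) is not None


def is_valid_time(value):
    """Return True if value has the form HH:MM:SS with colon separators."""
    match = TIME_PATTERN.fullmatch(value)
    if match is None:
        return False
    hours, minutes, seconds = (int(group) for group in match.groups())
    # Assumption: a 24 hour clock with ordinary minute and second ranges.
    return hours < 24 and minutes < 60 and seconds < 60
```

A value such as "14.30.00" is rejected because periods, not colons, are used as separators.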
Appendix B: Numeric Annotation Glyphs
*** TBD
Appendix C: File names and directories
File names chosen for PGN data should be both informative and portable. The
directory names and arrangements should also be chosen for the same reasons and
also for ease of navigation.
Some of the suggested file and directory names may be difficult or impossible to
represent on certain computing systems. Use of appropriate conversion customs
is encouraged.
C.1: File name suffix for PGN data
The use of the file suffix ".pgn" is encouraged for ASCII text files containing
PGN data.
C.2: File name formation for PGN data for a specific player
PGN games for a specific player should have a file name consisting of the
player's last name followed by the ".pgn" suffix.
C.3: File name formation for PGN data for a specific event
PGN games for a specific event should have a file name consisting of the
event's name followed by the ".pgn" suffix.
C.4: File name formation for PGN data for chronologically ordered games
PGN data files used for chronologically ordered (oldest first) archives use
date information as file name root strings. A file containing all the PGN
games for a given year would have an eight character name in the format
"YYYY.pgn". A file containing PGN data for a given month would have a ten
character name in the format "YYYYMM.pgn". Finally, a file for PGN games for a
single day would have a twelve character name in the format "YYYYMMDD.pgn".
Large files are split into smaller files as needed.
As game files are commonly arranged by chronological order, games with missing
or incomplete Date tag pair data are to be avoided. Any question mark
characters in a Date tag value will be treated as zero digits for collation
within a file and also for file naming.
Large quantities of PGN data arranged by chronological order should be
organized into hierarchical directories. A directory containing all PGN data
for a given year would have a four character name in the format "YYYY";
directories containing PGN files for a given month would have a six character
name in the format "YYYYMM".
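The naming rules of this section can be sketched as a small Python function. This is an illustration only: the function name and the returned dictionary keys are hypothetical, and the Date tag value is assumed to have the form "YYYY.MM.DD" with question marks standing for unknown digits.

```python
import re

# Assumed Date tag form: YYYY.MM.DD, where any digit may be a question mark.
DATE_PATTERN = re.compile(r"[0-9?]{4}\.[0-9?]{2}\.[0-9?]{2}")


def archive_names(date_tag):
    """Derive chronological file and directory names from a Date tag value."""
    if DATE_PATTERN.fullmatch(date_tag) is None:
        raise ValueError("unexpected Date tag value: %r" % date_tag)
    # Question marks are treated as zero digits for file naming.
    year, month, day = date_tag.replace("?", "0").split(".")
    return {
        "year_file": year + ".pgn",                # eight characters
        "month_file": year + month + ".pgn",       # ten characters
        "day_file": year + month + day + ".pgn",   # twelve characters
        "year_dir": year,                          # four characters
        "month_dir": year + month,                 # six characters
    }
```

With this sketch, a game dated "1992.11.04" maps to the day file "19921104.pgn" inside the month directory "199211".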
C.5: A suggested directory tree
A suggested directory arrangement for ftp sites and CD-ROM distributions:
* PGN: master directory of the PGN subtree (e.g., pub/chess/PGN)
* PGN/ReadMe: brief description of the local directory contents
* PGN/Standard: the PGN standard (this document)
* PGN/News: news and status of the entire PGN subtree
* PGN/Tools: software utilities that access PGN data
* PGN/Players: directory of PGN files, each for a specific player
* PGN/Players/ReadMe: brief description of the local directory contents
* PGN/Players/News: news and status of the player collection
* PGN/Events: directory of PGN files, each for a specific event
* PGN/Events/ReadMe: brief description of the local directory contents
* PGN/Events/News: news and status of the event collection
* PGN/MGR: directory of the Master Games Repository subtree
* PGN/MGR/ReadMe: brief description of the local directory contents
* PGN/MGR/News: news and status of the entire PGN/MGR subtree
* PGN/MGR/YYYY: directory of games or subtrees for the year YYYY
* PGN/MGR/YYYY/ReadMe: description of local directory for year YYYY
* PGN/MGR/YYYY/News: news and status for year YYYY data
Appendix D: PGN collating sequence
There is a standard sorting order for PGN games within a file. This collation
is based on eight keys; these are the seven tag values of the STR and also the
movetext itself.
The first (most important, primary key) is the Date tag. Earlier dated games
appear prior to games played at a later date. This field is sorted by
ascending numeric value first with the year, then the month, and finally the
day of the month. Query characters used for unknown date digit values will be
treated as zero digit characters for ordering comparison.
The second key is the Event tag. This is sorted in ascending ASCII order.
The third key is the Site tag. This is sorted in ascending ASCII order.
The fourth key is the Round tag. This is sorted in ascending numeric order
based on the value of the integer used to denote the playing round. A query or
hyphen used for the round is ordered before any integer value. A query
character is ordered before a hyphen character.
The fifth key is the White tag. This is sorted in ascending ASCII order.
The sixth key is the Black tag. This is sorted in ascending ASCII order.
The seventh key is the Result tag. This is sorted in ascending ASCII order.
The eighth key is the movetext itself. This is sorted in ascending ASCII order
with the entire text including spaces and newline characters.
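The eight keys can be combined into a single sort key. The following Python sketch assumes that each game is held as a dictionary with the seven tag values plus a hypothetical "movetext" entry, that Date values have the form "YYYY.MM.DD", and that a Round value is either "?", "-", or a plain integer.

```python
def date_key(date_tag):
    """Year, month, and day as integers; query characters count as zero."""
    year, month, day = date_tag.replace("?", "0").split(".")
    return (int(year), int(month), int(day))


def round_key(round_tag):
    """Order "?" first, then "-", then integers in ascending order."""
    if round_tag == "?":
        return (0, 0)
    if round_tag == "-":
        return (1, 0)
    return (2, int(round_tag))


def collation_key(game):
    """Build the eight part sort key for one game dictionary."""
    return (
        date_key(game["Date"]),
        game["Event"],      # ascending ASCII order
        game["Site"],
        round_key(game["Round"]),
        game["White"],
        game["Black"],
        game["Result"],
        game["movetext"],   # entire text, including spaces and newlines
    )
```

A list of games can then be ordered with sorted(games, key=collation_key). Python compares strings by character code, which agrees with ASCII order for ASCII text.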
Appendix E: PGN software
This appendix describes some PGN software that is currently available. The
entries are presented in rough chronological order of their initial
availability. Authors of PGN capable software are encouraged to contact the
PGN standard coordinator (e-mail address listed near the start of this
document) so that the information may be included here in this section.
Some PGN software is freeware and can be obtained from ftp sites and other
sources. Other PGN software is payware and appears as part of commercial
chessplaying programs and chess database managers. Those who are interested in
the propagation of the PGN standard are encouraged to support manufacturers of
chess software that use the standard. If a particular vendor does not offer
PGN compatibility, it is likely that a few letters to them along with a copy of
this specification may help them decide to include PGN support in their next
release.
The staff at the University of Oklahoma at Norman (USA) have graciously
provided an ftp site (chess.uoknor.edu) for the storage of chess related data
and programs. Because file names change over time, those accessing the site
are encouraged to first retrieve the file "pub/chess/ls-lR.gz" for a current
listing. A scan of this listing will also help locate versions of PGN programs
for machine types and operating systems other than those listed below.
E.1: The SAN Kit
The SAN Kit is an ANSI C source chess programming toolkit available for free
from the ftp site chess.uoknor.edu in the directory pub/chess/Unix as the file
"SAN.tar.gz" (a gzip tar archive). This kit contains code for PGN import and
export and can be used to "regularize" PGN data into reduced export format by
use of its "tfgg" command. Code from this kit is freely redistributable for
anyone as long as future distribution is unhindered for everyone. The SAN Kit
is undergoing continuous development, although dates of future deliveries are
quite difficult to predict. Suggestions and comments should be directed to its
author, Steven J. Edwards (sje@world.std.com).
E.2: pgnRead
The program pgnRead runs under MS Windows 3.1 and provides an interactive
graphical user interface for scanning PGN data files. This program includes a
colorful figurine chessboard display and scrolling controls for game and game
text selection. It is available from the chess.uoknor.edu ftp site in the
pub/chess/DOS directory; several versions are available with names of the form
"pgnrd**.exe"; the latest at this writing is "pgnrd121.exe". Suggestions and
comments should be directed to its author, Keith Fuller (keithfx@aol.com).
E.3: mail2pgn/GIICS
The program mail2pgn produces a PGN version of chess game data generated by the
ICS (Internet Chess Server). It can be found at the chess.uoknor.edu ftp site
in the pub/chess/DOS directory as the file "mail2pgn.zip". A C language version
is in the directory pub/chess/Unix as the file "mail2pgn.c". Suggestions and
comments should be directed to its author, John Aronson
(aronson@helios.ece.arizona.edu). This code has reportedly been incorporated
into the GIICS (Graphical Interface for the ICS); suggestions and comments
should be directed to its author, Tony Acero (ace3@midway.uchicago.edu).
E.4: XBoard
XBoard is a comprehensive chess utility running under the X Window system that
provides a graphical user interface in a portable manner. A new version now
handles PGN data. It is available from the chess.uoknor.edu ftp site in the
pub/chess/X directory as the file "xboard-3.0.pl9.tar.gz". Suggestions and
comments should be directed to its author, Tim Mann (mann@src.dec.com).
E.5: cupgn
The program "cupgn" converts game data stored in the ChessBase format into PGN.
It is available from the chess.uoknor.edu ftp site in the
pub/chess/Game-Databases/CBUFF directory as the file "cupgn.tar.gz". Another
version is in the directory pub/chess/DOS as the file "cupgn120.exe".
Suggestions and comments should be directed to its author, Anjo Anjewierden
(anjo@swi.psy.uva.nl).
E.6: Rumors
There are unofficial reports that the current or future versions of Chess
Assistant, BookUp8, HIARCS, and Zarkov will have some degree of PGN
compatibility.
Appendix F: PGN data archives
The primary PGN data archive repository is located at the ftp site
chess.uoknor.edu as the directory "pub/chess/PGN". It is organized according
to the description given in section C.5 of this document.
Appendix G: International Olympic Committee country codes
International Olympic Committee country codes are employed for Site nation
information because of their traditional use with the reporting of
international sporting events. Due to changes in geography and linguistic
custom, some of the following may be incorrect or outdated. Corrections and
extensions should be sent via e-mail to the PGN coordinator address listed near
the start of this document.
AFG: Afghanistan
ALB: Albania
ALG: Algeria
AND: Andorra
ANG: Angola
ANT: Antigua
ARG: Argentina
ARM: Armenia
AUS: Australia
AZB: Azerbaijan
BAN: Bangladesh
BAR: Bahrain
BHM: Bahamas
BEL: Belgium
BER: Bermuda
BIH: Bosnia and Herzegovina
BLA: Belarus
BLG: Bulgaria
BLZ: Belize
BOL: Bolivia
BRB: Barbados
BRS: Brazil
BRU: Brunei
BSW: Botswana
CAN: Canada
CHI: Chile
COL: Colombia
CRA: Costa Rica
CRO: Croatia
CSR: Czechoslovakia
CUB: Cuba
CYP: Cyprus
DEN: Denmark
DOM: Dominican Republic
ECU: Ecuador
EGY: Egypt
ENG: England
ESP: Spain
EST: Estonia
FAI: Faroe Islands
FIJ: Fiji
FIN: Finland
FRA: France
GAM: Gambia
GCI: Guernsey-Jersey
GEO: Georgia
GER: Germany
GHA: Ghana
GRC: Greece
GUA: Guatemala
GUY: Guyana
HAI: Haiti
HKG: Hong Kong
HON: Honduras
HUN: Hungary
IND: India
IRL: Ireland
IRN: Iran
IRQ: Iraq
ISD: Iceland
ISR: Israel
ITA: Italy
IVO: Ivory Coast
JAM: Jamaica
JAP: Japan
JRD: Jordan
JUG: Yugoslavia
KAZ: Kazakhstan
KEN: Kenya
KIR: Kyrgyzstan
KUW: Kuwait
LAT: Latvia
LEB: Lebanon
LIB: Libya
LIC: Liechtenstein
LTU: Lithuania
LUX: Luxembourg
MAL: Malaysia
MAU: Mauritania
MEX: Mexico
MLI: Mali
MLT: Malta
MNC: Monaco
MOL: Moldova
MON: Mongolia
MOZ: Mozambique
MRC: Morocco
MRT: Mauritius
MYN: Myanmar
NCG: Nicaragua
NET: The Internet
NIG: Nigeria
NLA: Netherlands Antilles
NLD: Netherlands
NOR: Norway
NZD: New Zealand
OST: Austria
PAK: Pakistan
PAL: Palestine
PAN: Panama
PAR: Paraguay
PER: Peru
PHI: Philippines
PNG: Papua New Guinea
POL: Poland
POR: Portugal
PRC: People's Republic of China
PRO: Puerto Rico
QTR: Qatar
RIN: Indonesia
ROM: Romania
RUS: Russia
SAF: South Africa
SAL: El Salvador
SCO: Scotland
SEN: Senegal
SEY: Seychelles
SIP: Singapore
SLV: Slovenia
SMA: San Marino
SRI: Sri Lanka
SUD: Sudan
SUI: Switzerland
SUR: Surinam
SVE: Sweden
SWE: Sweden
SWZ: Switzerland
SYR: Syria
TAI: Thailand
TMT: Turkmenistan
TRK: Turkey
TTO: Trinidad and Tobago
TUN: Tunisia
UAE: United Arab Emirates
UGA: Uganda
UKR: Ukraine
URU: Uruguay
USA: United States of America
UZB: Uzbekistan
VEN: Venezuela
VGB: British Virgin Islands
VIE: Vietnam
VUS: U.S. Virgin Islands
WLS: Wales
YEM: Yemen
YUG: Yugoslavia
ZAM: Zambia
ZIM: Zimbabwe
ZRE: Zaire
Appendix H: Additional chess data standards
While PGN is used for game storage, there are other data representation
standards for other chess related purposes.
H.1: FEN
FEN is "Forsyth-Edwards Notation"; it is a standard for describing chess
positions using the ASCII character set.
H.1.1: History
FEN is based on a 19th century standard for position recording designed by the
Scotsman John Forsyth, a newspaper journalist. The standard has been slightly
extended for use with chess software by Steven Edwards with assistance from
commentators on the Internet.
H.1.2: Uses for a position notation
Having a standard position notation is particularly important for chess
programmers as it allows them to share position databases. For example, there
exist standard position notation databases with many of the classical benchmark
tests for chessplaying programs, and by using a common position notation format
many hours of tedious data entry can be saved. Additionally, a position
notation can be useful for page layout programs and for confirming position
status for e-mail competition.
Many interesting chess problem sets represented with FEN can be found at the
chess.uoknor.edu ftp site in the directory pub/chess/SAN_testsuites.
H.1.3: Data fields
FEN specifies the piece placement, the active color, the castling availability,
the en passant target square, the halfmove clock, and the fullmove number.
These can all fit on a single text line in an easily read format. The length
of a FEN position description varies somewhat according to the position. In
some cases, the description could be eighty or more characters in length and so
may not fit conveniently on some displays. However, these positions aren't too
common.
A FEN description has six fields. Each field is composed only of nonblank
printing ASCII characters. Adjacent fields are separated by a single ASCII
space character.
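As an illustration of the six field layout, the following Python sketch splits a FEN description on single spaces and performs a few basic checks. The type name Fen, the field names, and the function name parse_fen are hypothetical; the contents of the individual fields are not validated here.

```python
from typing import NamedTuple


class Fen(NamedTuple):
    placement: str
    active_color: str
    castling: str
    en_passant: str
    halfmove_clock: int
    fullmove_number: int


def parse_fen(text):
    """Split a FEN description into its six fields."""
    fields = text.split(" ")
    # Exactly six nonblank fields, each pair separated by one space.
    if len(fields) != 6 or "" in fields:
        raise ValueError("expected six fields separated by single spaces")
    placement, color, castling, en_passant, halfmove, fullmove = fields
    if color not in ("w", "b"):
        raise ValueError("active color must be 'w' or 'b'")
    for number in (halfmove, fullmove):
        if not (number.isascii() and number.isdigit()):
            raise ValueError("clock fields must be decimal integers")
    if int(fullmove) < 1:
        raise ValueError("fullmove number must be positive")
    return Fen(placement, color, castling, en_passant,
               int(halfmove), int(fullmove))
```

Splitting on a single space rather than on arbitrary whitespace means that a description with doubled spaces is rejected instead of being silently accepted.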
H.1.3.1: Piece placement data
The first field represents the placement of the pieces on the board. The board
contents are specified starting with the eighth rank and ending with the first
rank. For each rank, the squares are specified from file a to file h. White
pieces are identified by uppercase SAN piece letters ("PNBRQK") and black
pieces are identified by lowercase SAN piece letters ("pnbrqk"). Empty squares
are represented by the digits one through eight; the digit used represents the
count of contiguous empty squares. A solidus character "/" is used to separate
data of adjacent ranks.
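The piece placement encoding can be expanded into an eight by eight array as sketched below in Python. The function name is hypothetical, and the choice of None for an empty square is an assumption of this example.

```python
def parse_piece_placement(field):
    """Expand a piece placement field into 8 rows of 8 squares.

    Row 0 is the eighth rank and row 7 is the first rank; within a row,
    index 0 is file a and index 7 is file h. Empty squares are None.
    """
    ranks = field.split("/")
    if len(ranks) != 8:
        raise ValueError("expected eight ranks separated by '/'")
    board = []
    for rank in ranks:
        row = []
        for ch in rank:
            if ch in "12345678":
                # A digit is a count of contiguous empty squares.
                row.extend([None] * int(ch))
            elif ch in "PNBRQKpnbrqk":
                row.append(ch)
            else:
                raise ValueError("unexpected character: %r" % ch)
        if len(row) != 8:
            raise ValueError("rank does not describe eight squares")
        board.append(row)
    return board
```

For the starting position, row 0 holds the black pieces "rnbqkbnr" and row 7 holds the white pieces "RNBQKBNR".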
H.1.3.2: Active color
The second field represents the active color. A lower case "w" is used if
White is to move; a lower case "b" is used if Black is the active player.
H.1.3.3: Castling availability
The third field represents castling availability. This indicates potential
future castling that may not be possible at the moment due to blocking pieces
or enemy attacks. If there is no castling availability for either side, the
single character symbol "-" is used. Otherwise, a combination of one to
four characters is present. If White has kingside castling availability, the
uppercase letter "K" appears. If White has queenside castling availability,
the uppercase letter "Q" appears. If Black has kingside castling availability,
the lowercase letter "k" appears. If Black has queenside castling
availability, then the lowercase letter "q" appears. Those letters which
appear will be ordered first uppercase before lowercase and second kingside
before queenside. There is no whitespace between the letters.
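The ordering rule means that a well formed castling field is either "-" or a nonempty selection of letters from "KQkq" taken in that order. A minimal Python check, with a hypothetical function name, might read:

```python
def is_valid_castling(field):
    """Return True for "-" or an ordered, nonrepeating subset of "KQkq"."""
    if field == "-":
        return True
    if not field:
        return False
    last_index = -1
    for ch in field:
        index = "KQkq".find(ch)
        # An unknown letter gives -1; a repeat or misordering gives an
        # index that does not exceed the previous one.
        if index <= last_index:
            return False
        last_index = index
    return True
```

This checks only the form of the field; it does not confirm that the kings and rooks are actually on squares that make the stated castling possible.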
H.1.3.4: En passant target square
The fourth field is the en passant target square. If there is no en passant
target square then the single character symbol "-" appears. If there is an en
passant target square then it is represented by a lowercase file character
immediately followed by a rank digit. Obviously, the rank digit will be "3"
following a white pawn double advance (Black is the active color) or else be
the digit "6" after a black pawn double advance (White being the active color).
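The relation between the active color and the rank digit can be checked mechanically. The following Python sketch uses a hypothetical function name and checks only the form of the field, not whether a pawn has in fact just made a double advance.

```python
def is_valid_en_passant(field, active_color):
    """Check the en passant field against the active color ("w" or "b")."""
    if active_color not in ("w", "b"):
        raise ValueError("active color must be 'w' or 'b'")
    if field == "-":
        return True
    if len(field) != 2 or field[0] not in "abcdefgh":
        return False
    # Rank 3 follows a white double advance (Black to move);
    # rank 6 follows a black double advance (White to move).
    expected_rank = "3" if active_color == "b" else "6"
    return field[1] == expected_rank
```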
H.1.3.5: Halfmove clock
The fifth field is a nonnegative integer representing the halfmove clock. This
number is the count of halfmoves (or ply) since the last pawn advance or
capturing move. This value is used for the fifty move draw rule.
H.1.3.6: Fullmove number
The sixth and last field is a positive integer that gives the fullmove number.
This will have the value "1" for the first move of a game for both White and
Black. It increments by one immediately after each move by Black.
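The behavior of the two counters can be summarized as a small update step applied after each move. In the following Python sketch the function and parameter names are hypothetical; the caller is assumed to know whether the move just played was a pawn advance or a capture.

```python
def advance_counters(halfmove_clock, fullmove_number, active_color,
                     pawn_advance_or_capture):
    """Return the counters and active color after one move is played.

    active_color is the side that made the move ("w" or "b").
    """
    if pawn_advance_or_capture:
        halfmove_clock = 0        # reset for the fifty move draw rule
    else:
        halfmove_clock += 1
    if active_color == "b":
        fullmove_number += 1      # increments after each move by Black
        next_color = "w"
    else:
        next_color = "b"
    return halfmove_clock, fullmove_number, next_color
```

Starting from a halfmove clock of 0 and a fullmove number of 1 with White to move, a pawn move by each side followed by a knight move by White yields a halfmove clock of 1 and a fullmove number of 2 with Black to move.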
H.1.4: Examples
Here's the FEN for the starting position:
rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
And after the move 1. e4:
rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1
And then after 1. ... c5:
rnbqkbnr/pp1ppppp/8/2p5/4P3/8/PPPP1PPP/RNBQKBNR w KQkq c6 0 2
And then after 2. Nf3:
rnbqkbnr/pp1ppppp/8/2p5/4P3/5N2/PPPP1PPP/RNBQKB1R b KQkq - 1 2
For two kings on their home squares and a white pawn on e2 (White to move),
with thirty-eight full moves played and five halfmoves since the last pawn
move or capture:
4k3/8/8/8/8/8/4P3/4K3 w - - 5 39
H.2: EPD
EPD is "Extended Position Description"; it is a standard for describing chess
positions along with an extended set of structured attribute values using the
ASCII character set. It is intended for computer use for data interchange among
chessplaying programs. It is also intended for the representation of portable
opening library repositories. A specification for EPD is currently under
development.
Appendix I: Alternative chesspiece identifier letters
English language piece names are used to define the letter set for identifying
chesspieces in PGN movetext. However, authors of software that is used only
for local presentation or scanning of chess move data may find it convenient to
use piece letter codes common in their locales. This is not a problem as long
as PGN data that resides in archival storage or that is exchanged among
programs still uses the standard English piece letter codes: "PNBRQK".
For the above authors only, a list of alternative piece letter codes is
provided:
Language     Piece letters (pawn knight bishop rook queen king)
----------   --------------------------------------------------
Czech        P J S V D K
Danish       B S L T D K
Dutch        O P L T D K
English      P N B R Q K
Estonian     P R O V L K
Finnish      P R L T D K
French       P C F T D R
German       B S L T D K
Hungarian    G H F B V K
Italian      P C A T D R
Norwegian    B S L T D K
Polish       P S G W H K
Portuguese   P C B T D R
Romanian     P C N T D R
Spanish      P C A T D R
Swedish      B S L T D K
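Software that accepts local piece letters must convert them to the English codes before the data is archived or exchanged. The following Python sketch shows one way this might be done for a single move; the function name is hypothetical, only a few rows of the table are included, and castling moves are passed through unchanged on the assumption that they are written with the letter "O".

```python
# A few rows from the table above, in the order
# pawn, knight, bishop, rook, queen, king.
PIECE_LETTERS = {
    "Dutch": "OPLTDK",
    "English": "PNBRQK",
    "French": "PCFTDR",
    "Spanish": "PCATDR",
}


def to_english(move, language):
    """Translate the piece letters of one move to the English codes."""
    if move.startswith("O-O"):
        # Castling uses the letter O, which is also a piece letter in
        # some languages, so it is left untouched.
        return move
    table = str.maketrans(PIECE_LETTERS[language], PIECE_LETTERS["English"])
    # File letters are lowercase and so are not affected.
    return move.translate(table)
```

Because the translation is applied to all letters at once, a French "R" (king) becomes "K" while a French "T" (rook) becomes "R" without one substitution interfering with the other.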
PGN: EOF